1 Introduction

Moving to a new city on a tight budget is challenging. A metropolis like London in particular has high rents and a competitive market, which makes it difficult to find accommodation at a reasonable price with exactly the attributes you are looking for. Sharing economy services like Airbnb have facilitated the search for a nice room rented out by a private host. The rooms and apartments on offer are furnished, so the user can settle right in. But how do you know whether the price you are paying for your flat is actually a fair price?

The profits of both hosts and the platform itself have skyrocketed in recent years. A typical UK host earns around 3,000 pounds a year (Cox, 2017a). This profit ultimately comes from the users, who pay both the platform fee and the host’s profit margin out of their own pockets. If you are on a tight budget yourself, you want to pay a price close to the market average for the attributes that are important to you. This paper aims to create a model that forecasts the price per night a user will pay for an Airbnb matching his requirements, in order to facilitate checking whether the price of an apartment is indeed fair.

2 Description of the dataset

The dataset for this investigation covers all Airbnb offerings in London as of the 4th and 5th of March 2017. It contains 53,904 objects with 95 different variables. Its source is the website “Inside Airbnb - Adding data to the debate” (Cox, 2017b). This is an independent and non-commercial project that aims to examine the effect of Airbnb activities on urban development.

To focus this investigation on its actual goal of helping students find the right Airbnb, the initial dataset was reduced by several selections and filters. For example, only listings offering a private room and having at least three valid ratings were included. The resulting dataset has 7,020 objects with 78 variables and is described in the following.

2.1 Price

Table 1: Summary of Price Variable
Min Q1 Median Mean Q3 Max
8 35 45 50.06994 59 590

With price being the dependent variable of our investigation, it can be considered the most important one. The summary statistics show that 75% of all Airbnbs are priced at 59 pounds per night or less. However, there are some severe outliers, ranging up to a maximum of 590 pounds.

This casts doubt on whether the price follows a normal distribution, which would be desirable for a later linear regression. In fact, the density plot of the price on the left-hand side below does not resemble a normal distribution. However, the plot on the right-hand side shows that the price looks almost normally distributed on a logarithmic scale.

Figure 1: Density of Price and Log10 of Price
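
The effect of the logarithmic scale can be illustrated with a minimal sketch. The prices below are simulated from a log-normal distribution and are not the Airbnb data; the helper `skewness` is a hypothetical name introduced only for this illustration.

```r
# Simulated right-skewed prices (NOT the Airbnb data), for illustration only
set.seed(1)
price <- round(exp(rnorm(5000, mean = log(45), sd = 0.4)))
price <- pmax(price, 1)  # guard against zeros so that log10() stays finite

# Hypothetical helper: sample skewness (close to 0 for a symmetric distribution)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness(price)         # clearly positive: long right tail
skew_log <- skewness(log10(price))  # close to zero: roughly symmetric

# Density of the raw and the log10-transformed price side by side
op <- par(mfrow = c(1, 2))
plot(density(price), main = "Price")
plot(density(log10(price)), main = "Log10 of Price")
par(op)
```

A skewness near zero after the transformation is consistent with, but does not prove, approximate normality of the log price.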

2.2 Rent

With London being one of the most expensive cities to live in, rent can be considered the major cost of providing an Airbnb. Therefore, we would like to examine the relationship between rent and Airbnb price. However, the initial dataset holds no information on the regular rent at the location of an Airbnb. Searching the big property websites such as Rightmove and Zoopla, we found a website called “Find Properly” (see Lokku Ltd., 2017), which uses the data from Zoopla and provides rent and selling prices for 217 zipcode areas, from BR1 to WD25. Using the zipcode, we were able to map the average weekly rent for 1-bed properties to every Airbnb.
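
The mapping step can be sketched as a left join on the first half of the zipcode. This is a minimal illustration with made-up listings and rent values, not the actual procedure or data; the column names `zip_first` and `mean_rent` follow the appendix, and the rule "everything before the first space" for extracting the first half is an assumption.

```r
# Toy listings; zipcodes and prices are made up for illustration
listings <- data.frame(
  id = 1:4,
  zipcode = c("SW1A 1AA", "e1 6an", "BR1 1AA", "WD25 7LR"),
  price = c(80, 45, 35, 40),
  stringsAsFactors = FALSE
)

# Toy rent table keyed by the first half of the zipcode (values are made up)
rents <- data.frame(
  zip_first = c("SW1A", "E1", "BR1"),
  mean_rent = c(650, 420, 230),
  stringsAsFactors = FALSE
)

# First half of the zipcode: everything before the first space, upper-cased
listings$zip_first <- toupper(sub(" .*$", "", trimws(listings$zipcode)))

# Left join: keep every listing; mean_rent is NA where no rent is known
mapped <- merge(listings, rents, by = "zip_first", all.x = TRUE)
mapped <- mapped[order(mapped$id), ]
```

Keeping unmatched listings with a missing `mean_rent` makes gaps in the rent table visible instead of silently dropping rows.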

Figure 2: Mapping Rent Prices vs. Airbnb Prices

Mapping the mean rent and the logarithmic Airbnb prices by location reveals some of the expected relationship. Nevertheless, it also becomes clear that there is more to an Airbnb price than just the average rent in the particular neighbourhood.

2.3 Location

Table 2: Correlation Test between Distance and Price
P-Value Conf Low Estimate Conf High
cor 4.802688e-260 -0.414006 -0.3944327 -0.3744948

When choosing an Airbnb in London, people may consider its location, since it determines how convenient it is to travel around or live in the city. In our model, we use the distance to the touristic city center, Piccadilly Circus, as a measure of the Airbnb’s location. It was calculated using the Haversine formula and the geographic coordinates of Piccadilly Circus (Longitude: -0.133869, Latitude: 51.510067).
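
A minimal sketch of the Haversine distance in R is given here for illustration. The function name `haversine_km`, the mean Earth radius of 6371 km and the example listing coordinates are assumptions; only the coordinates of Piccadilly Circus are taken from the text.

```r
# Haversine great-circle distance in km between points given in degrees.
# The Earth radius of 6371 km is an assumed mean value.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin() guards against rounding above 1
}

# Coordinates of Piccadilly Circus as given in the text
pc_lon <- -0.133869
pc_lat <- 51.510067

# Two example points (made up): Piccadilly Circus itself and a point
# on the same latitude at longitude 0
distance <- haversine_km(c(-0.133869, 0), c(51.510067, 51.510067), pc_lon, pc_lat)
```

The function is vectorised, so it can be applied to whole longitude and latitude columns at once.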

According to the boxplot and the correlation test above, the distance to the city center and the price are significantly negatively correlated (estimate of about -0.39). Statistically speaking, the closer to the city center, the higher the price.
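
A correlation test with the same four outputs as in the table above (p-value, confidence bounds and estimate) could be obtained as sketched below. The data are simulated and not the Airbnb data, and whether the original test used the raw or the logarithmic price is not stated, so this is only an illustration of the technique.

```r
# Simulated data (NOT the Airbnb data): price falls with distance plus noise
set.seed(42)
distance <- runif(2000, min = 0, max = 20)
price <- 60 - 1.2 * distance + rnorm(2000, sd = 15)

# Pearson correlation test with a 95% confidence interval (the default)
ct <- cor.test(distance, price)

# Collect the results in the layout of the table above
result <- data.frame(
  p_value = ct$p.value,
  conf_low = ct$conf.int[1],
  estimate = unname(ct$estimate),
  conf_high = ct$conf.int[2]
)
```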

Figure 3: Mapping Rent Prices vs. Airbnb Prices

2.4 Reviews

In addition to written reviews, guests can give their hosts star ratings on the following parameters (see Airbnb Inc., 2017): overall experience, accuracy, cleanliness, communication, check in, location and value. Overall experience relates to the general impression of the guest and is only calculated for ads with at least three reviews. Accuracy asks how well the ad represented the real properties of the apartment. Cleanliness accounts for the tidiness of the flat. Check in and communication are both service-based: Was communication with the host before and during the stay sufficient, and was the check in process smooth or difficult? The location is evaluated based on security, comfort and attractiveness of the neighbourhood. Finally, value is a subjective measure of whether the guests believe that the apartment is worth the price paid, which makes it an interesting measure for our analysis.

While guests give their ratings on a one-to-five-star scale, the dataset transforms them to a scale from 1 to 10 for the subratings and from 0 to 100 for the overall rating. As the table below shows, the average ratings are very high: 9 or 10 for the subrating scores and 92 for the overall score. The lowest observed values lie between 2 and 4 for the subcategories and at 20 for the overall rating. Ads with good ratings are thus overrepresented, which suggests that ads with bad reviews are unlikely to be booked and are therefore removed from the website.

As the overall score is picked individually by the guest, the subcategories relate to it to different degrees. The overall score is only moderately correlated with location, communication and cleanliness (0.54 to 0.68), whereas accuracy, check in and value are strongly correlated with it (0.77 to 0.79). For the analysis, this implies a higher impact of the latter variables on the model and shows the necessity to analyse both the subcategories and the overall rating score, as they are given independently. The relation between the rating scores and price is weak: only location shows a weak correlation with price (0.30), while all other scores are nearly uncorrelated with it (0.04 to 0.14).

Table 3: Summary of Review Scores
Name Minimum Maximum Mean Correlation_Rating Correlation_Price
Accuracy 2 10 9 0.77 0.09
Check In 2 10 10 0.78 0.14
Cleanliness 2 10 9 0.67 0.10
Communication 4 10 10 0.68 0.10
Location 3 10 9 0.54 0.30
Value 2 10 9 0.79 0.04
Overall 20 100 92 1.00 0.13
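
A summary table of this shape could be assembled as in the following sketch. The review data are simulated and not the Airbnb data, the latent `quality` variable is purely a simulation device, and only two subratings are shown to keep the example short.

```r
# Simulated review scores (NOT the Airbnb data), driven by a latent quality
set.seed(7)
n <- 500
quality <- rnorm(n)
clip <- function(x, lo, hi) pmin(hi, pmax(lo, round(x)))  # hypothetical helper

reviews <- data.frame(
  review_scores_value = clip(9 + quality + rnorm(n, sd = 0.7), 2, 10),
  review_scores_cleanliness = clip(9 + quality + rnorm(n, sd = 1), 2, 10),
  review_scores_rating = clip(92 + 6 * quality + rnorm(n, sd = 3), 20, 100),
  price = round(exp(rnorm(n, mean = log(45), sd = 0.4)))
)

sub_scores <- c("review_scores_value", "review_scores_cleanliness")

# One row per subrating: range, mean and correlation with rating and price
summary_table <- data.frame(
  Name = sub_scores,
  Minimum = sapply(reviews[sub_scores], min),
  Maximum = sapply(reviews[sub_scores], max),
  Mean = round(sapply(reviews[sub_scores], mean)),
  Correlation_Rating = round(sapply(reviews[sub_scores], cor,
                                    y = reviews$review_scores_rating), 2),
  Correlation_Price = round(sapply(reviews[sub_scores], cor,
                                   y = reviews$price), 2),
  row.names = NULL
)
```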

2.5 Property Characteristics and Amenities

On Airbnb, hosts offer a range of amenities to their guests, such as Wifi, a kitchen, a washer or even shampoo, and these amenities are shown on the listing’s webpage for the guests’ reference. Across the 53,844 room offerings in London, an internet connection is the most common amenity: 97% of the rooms offer Wifi or an internet connection. Other general household amenities are also provided frequently, such as heating (96%), kitchens (91%), essentials (85%), washers (81%), shampoo (62%), hangers (61%) and irons (56%). However, most offerings in London do not allow guests to smoke or to bring pets: 77% of rooms have smoke detectors, and only 10% of rooms allow smoking and pets. Given the weather in London, air conditioning seems to be unimportant, whereas it would be a must in most Asian countries.
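
How such percentages might be derived from 0/1 amenity dummy variables is sketched below. The toy data are made up; the column names `amen_name` and `number` mirror those used in the plotting code that follows, but how the original table was actually built is not shown in this document.

```r
# Toy listings with 0/1 amenity dummies (made up for illustration)
toy <- data.frame(
  amen_Wifi = c(1, 1, 1, 1),
  amen_Kitchen = c(1, 1, 1, 0),
  amen_Smoking_allowed = c(0, 0, 1, 0)
)

# Share of listings offering each amenity, in percent
amen_cols <- grep("^amen_", names(toy), value = TRUE)
share <- colMeans(toy[amen_cols]) * 100

# Long table with one row per amenity, sorted from most to least common
amen_share <- data.frame(
  amen_name = sub("^amen_", "", names(share)),
  number = as.numeric(share),
  stringsAsFactors = FALSE
)
amen_share <- amen_share[order(-amen_share$number), ]
```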

library(dplyr)
library(ggplot2)

# Share of listings offering each amenity, sorted from most to least common
ggplot(data_remove_amen, aes(x = reorder(amen_name, -number), y = number)) +
  geom_col() +
  theme(axis.text = element_text(angle = 90, hjust = 1)) +
  labs(x = "amenity", y = "%")

rating <- list("number_of_reviews", "review_scores_rating", "review_scores_value")
rating1 <- unlist(rating)
amen_list <- list(amen_name)
amen_list1 <- unlist(amen_list)

# Keep amenity dummies, price and review variables; drop rows with missing reviews
data.cor <- data_short %>%
  select(contains("amen_"), price, number_of_reviews, review_scores_rating,
         review_scores_value, review_scores_cleanliness) %>%
  filter(!is.na(number_of_reviews), !is.na(review_scores_rating),
         !is.na(review_scores_value), !is.na(review_scores_cleanliness))

# Replace special characters in amenity names; plyr::rename() takes c(old = new)
data_amen1 <- plyr::rename(data.cor, c(
  "amen_24-hour_check-in" = "amen_24_hour_check_in",
  "amen_Washer_/_Dryer" = "amen_Washer_Dryer",
  "amen_Family/kid_friendly" = "amen_Family_kid_friendly",
  "amen_Buzzer/wireless_intercom" = "amen_Buzzer_wireless_intercom",
  "amen_Cat(s)" = "amen_Cat",
  "amen_Dog(s)" = "amen_Dog",
  "amen_Other_pet(s)" = "amen_Other_pet",
  "amen_Self_Check-In" = "amen_Self_Check_In"
))

# Subsets used for the boxplots
data_amen  <- filter(data_amen1, price < 100)
data_amen2 <- filter(data_amen1, review_scores_rating >= 80)
data_amen3 <- filter(data_amen1, review_scores_value >= 8)
data_amen4 <- filter(data_amen1, review_scores_cleanliness >= 8)
data_amen5 <- filter(data_amen1, number_of_reviews <= 50)

2.5.1 Key amenities influencing the price

There are around 49 different amenities offered on Airbnb in London. Of these, we found 7 that may influence the price: home essentials such as kitchens, TVs, dryers and washers, facilities such as elevators, whether the environment is family/kid friendly, and whether there is a lock on the bedroom door. Accommodations with TVs, elevators, dryers and washers are priced higher than those without. This holds especially for TVs, which may also lead to a higher overall rating and perceived value among guests (please refer to the plots of amenity against rating below; since most ratings are above 80 or 8, we keep only rooms rated at least 80 or 8 to make the difference visible. Guests tend to rate rooms in the upper range, mostly above 90 or 9, so a drop from 10 to 9 should already be significant). However, the market does not seem to value a family/kid friendly environment or a kitchen: prices of accommodations with these amenities are slightly lower than prices of those without. Possibly, these amenities are associated with more work and noise. Another interesting finding is that rooms without a lock on the bedroom door may have a higher price than others, possibly because rooms with a lock are mainly located in less safe areas.

# Price by amenity (listings priced below 100 pounds)
ggplot(data_amen, aes(x = amen_Family_kid_friendly, y = price)) + geom_boxplot()
ggplot(data_amen, aes(x = amen_TV, y = price)) + geom_boxplot()
ggplot(data_amen, aes(x = amen_Elevator_in_building, y = price)) + geom_boxplot()
ggplot(data_amen, aes(x = amen_Dryer, y = price)) + geom_boxplot()
ggplot(data_amen, aes(x = amen_Kitchen, y = price)) + geom_boxplot()
ggplot(data_amen, aes(x = amen_Washer, y = price)) + geom_boxplot()
ggplot(data_amen, aes(x = amen_Lock_on_bedroom_door, y = price)) + geom_boxplot()

# Overall rating by amenity (listings with an overall rating of at least 80)
ggplot(data_amen2, aes(x = amen_TV, y = review_scores_rating)) + geom_boxplot()
ggplot(data_amen2, aes(x = amen_Elevator_in_building, y = review_scores_rating)) + geom_boxplot()
ggplot(data_amen2, aes(x = amen_Family_kid_friendly, y = review_scores_rating)) + geom_boxplot()
ggplot(data_amen2, aes(x = amen_Dryer, y = review_scores_rating)) + geom_boxplot()
ggplot(data_amen2, aes(x = amen_Kitchen, y = review_scores_rating)) + geom_boxplot()
ggplot(data_amen2, aes(x = amen_Washer, y = review_scores_rating)) + geom_boxplot()
ggplot(data_amen2, aes(x = amen_Lock_on_bedroom_door, y = review_scores_rating)) + geom_boxplot()

# Value rating by TV (listings with a value rating of at least 8)
ggplot(data_amen3, aes(x = amen_TV, y = review_scores_value)) + geom_boxplot()

The remaining variables in the dataset fall into three groups:

  • Property Characteristics: Some general information on the property, such as the room type, the number of people that can be accommodated or the number of bathrooms.
  • Amenities: On top of the characteristics, Airbnb contains information on a wide range of amenities for every flat. These range from the availability of Internet and a TV up to a personal doorman or a pool. We introduced dummy variables for 52 different amenities as well as a variable counting the total number of amenities.
  • Offering Characteristics: Lastly, some information on the cancellation policy and on whether the Airbnb is instantly bookable was included.
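
The construction of amenity dummy variables and of the amenity count can be sketched as follows. The raw format `{a,b,"c d"}` of the amenities string is an assumption about the source data, the example strings are made up, and `parse_amenities` is a hypothetical helper; the `amen_` prefix and the name `amenities_count` follow the appendix.

```r
# Toy raw amenity strings; the "{a,b,c}" format is an ASSUMPTION about the raw data
amenities <- c('{TV,Wifi,Kitchen}', '{Wifi,"Smoke detector"}', '{}')

# Hypothetical helper: turn one raw string into a character vector of amenities
parse_amenities <- function(x) {
  x <- gsub('[{}"]', "", x)
  if (x == "") character(0) else strsplit(x, ",", fixed = TRUE)[[1]]
}

parsed <- lapply(amenities, parse_amenities)
all_amen <- sort(unique(unlist(parsed)))

# One 0/1 dummy column per amenity (rows = listings, columns = amenities)
dummies <- sapply(all_amen, function(a) {
  as.integer(sapply(parsed, function(p) a %in% p))
})
colnames(dummies) <- paste0("amen_", gsub(" ", "_", all_amen))

listings <- data.frame(dummies, check.names = FALSE)
listings$amenities_count <- lengths(parsed)  # total number of amenities
```

Replacing spaces in the column names at this stage avoids having to rename columns before they can be used in plotting or model formulas.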

Furthermore, the Cartesian coordinates were calculated, using the instructions of Irawan (2014), to plot the data on maps provided by Lovelace & Cheshire (2014).

Bibliography

Airbnb Inc. (2017) How do star ratings work. [Online]. Available from: https://de.airbnb.com/help/article/1257/how-do-star-ratings-work.

Cox, J. (2017a) Airbnb: Surge in UK hosts over past year boosts local economies. The Independent. [Online]. Available from: http://www.independent.co.uk/news/business/news/airbnb-hosts-uk-surge-boost-local-economies-online-holiday-rental-london-southwest-northern-ireland-a7940451.html.

Cox, M. (2017b) Inside Airbnb - Adding data to the debate. [Online]. Available from: http://data.insideairbnb.com/united-kingdom/england/london/2017-03-04/data/listings.csv.gz.

Irawan, D.E. (2014) How to convert lat-long coordinates to UTM. [Online]. Available from: https://rpubs.com/dasaptaerwin/19879.

Lokku Ltd. (2017) London house prices by postcode. [Online]. Available from: https://www.findproperly.co.uk/london/postcode/#.WdvonHeZNn4.

Lovelace, R. & Cheshire, J. (2014) Introduction to visualising spatial data in R. National Centre for Research Methods Working Papers. [Online] 14 (03). Available from: https://github.com/Robinlovelace/Creating-maps-in-R.

Appendix

Column Numbers Name Description
1 price Price per night as offered on Airbnb
2 zip_first First half of the London Zipcode
3 mean_rent Mean rent for the given zipcode as per Lokku Ltd. (2017)
4 distance Distance from Piccadilly Circus in km
5 - 6 east & north Geographic Cartesian coordinates required for map plotting
7 - 13 review_scores Average customer reviews from Airbnb
14 number_of_reviews Number of customer reviews
15 property_type
16 room_type
17 accommodates
18 bathrooms
19 bedrooms
20 beds
21 amenities_count
22 - 74 amen Dummy Variables for the various amenities
75 minimum_nights
76 instant_bookable
77 cancellation_policy

Imperial College Business School